95  Odds Ratios

95.1 Introduction

An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B.

It is used extensively in epidemiology and is defined as the ratio of the odds of A occurring in the presence of B to the odds of A occurring in the absence of B. For example, with A as death and B as smoking, the OR compares the odds of death among smokers with the odds of death among non-smokers.
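
The verbal definition above can also be written as a formula. This is a sketch in conditional-probability notation, which the text itself does not use: the odds of an event with probability p are p / (1 - p), and the symbols P(A | B) and P(A | ¬B) are introduced here only for illustration.

```latex
\mathrm{OR}
  = \frac{P(A \mid B) \,/\, \bigl(1 - P(A \mid B)\bigr)}
         {P(A \mid \neg B) \,/\, \bigl(1 - P(A \mid \neg B)\bigr)}
```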

In the following tutorial, we’ll calculate the odds ratio using R with a hypothetical dataset.

We’ll assume we have a dataset of patients, some of whom have been exposed to a certain treatment. We’re interested in whether the treatment is associated with recovery.

95.1.1 Step 1: Create the Dataset

First, we create a contingency table of treatment exposure and patient recovery.

# Define the counts of recovery vs. no recovery for both treatment and control groups
treatment_recovered <- 60    # Patients recovered with treatment
treatment_not_recovered <- 40 # Patients not recovered with treatment
control_recovered <- 30       # Patients recovered without treatment
control_not_recovered <- 70   # Patients not recovered without treatment

# Create a matrix to represent this data
data_matrix <- matrix(c(treatment_recovered, treatment_not_recovered,
                        control_recovered, control_not_recovered),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("Treatment", "Control"),
                                      c("Recovered", "Not_Recovered")))

# Look at the matrix
data_matrix
          Recovered Not_Recovered
Treatment        60            40
Control          30            70
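
In practice the counts usually come from patient-level records rather than being typed in by hand. The sketch below assumes a hypothetical data frame `patients` with one row per patient and columns `group` and `outcome` (these names are illustrative, not part of the tutorial's data), and uses `table()` to rebuild the same contingency table.

```r
# Hypothetical patient-level data: one row per patient.
# The counts are chosen to match the contingency table above.
patients <- data.frame(
  group = rep(c("Treatment", "Control"), times = c(100, 100)),
  outcome = c(rep(c("Recovered", "Not_Recovered"), times = c(60, 40)),
              rep(c("Recovered", "Not_Recovered"), times = c(30, 70)))
)

# Fix the level order so rows and columns line up with data_matrix
patients$group <- factor(patients$group,
                         levels = c("Treatment", "Control"))
patients$outcome <- factor(patients$outcome,
                           levels = c("Recovered", "Not_Recovered"))

# Cross-tabulate treatment group against outcome
patient_table <- table(patients$group, patients$outcome)
patient_table
```

Setting the factor levels explicitly matters: without it, `table()` orders categories alphabetically, which would put Control first and invert the odds ratio computed later.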

95.1.2 Step 2: Calculate the Odds Ratio

We can now calculate the odds ratio.

The odds of recovery in the treatment group are treatment_recovered / treatment_not_recovered (60 / 40 = 1.5), and in the control group they are control_recovered / control_not_recovered (30 / 70 ≈ 0.43).

The OR is the ratio of these two odds.

# Calculate the Odds Ratio manually
treatment_odds <- treatment_recovered / treatment_not_recovered
control_odds <- control_recovered / control_not_recovered
odds_ratio <- treatment_odds / control_odds

# Print the Odds Ratio
odds_ratio
[1] 3.5
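
The same ratio of odds can be computed directly from the 2x2 table as the cross-product (a × d) / (b × c), where a and b are the first row and c and d the second. The helper `cross_product_or` below is a hypothetical name used only for this sketch; it assumes a table with no zero cells.

```r
# Hypothetical helper: odds ratio of a 2x2 table via the cross-product
cross_product_or <- function(tab) {
  stopifnot(all(dim(tab) == c(2, 2)))   # only defined here for 2x2 tables
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# Rebuild the contingency table so this block runs on its own
tab <- matrix(c(60, 40,
                30, 70),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Treatment", "Control"),
                              c("Recovered", "Not_Recovered")))

cross_product_or(tab)
```

Here the cross-product is (60 × 70) / (40 × 30) = 4200 / 1200, which matches the manual result of 3.5.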

95.1.3 Step 3: Calculate the Odds Ratio Using a Predefined Function

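R can also estimate the odds ratio with built-in functions. One option is fisher.test, which performs Fisher’s Exact Test and remains valid for small sample sizes. Note that fisher.test reports the conditional maximum likelihood estimate rather than the sample odds ratio, which is why its estimate (3.476642) differs slightly from the manual result of 3.5.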

# Calculate the Odds Ratio using Fisher's Exact Test
fisher_result <- fisher.test(data_matrix)

# The odds ratio is given in the result, along with the confidence interval
fisher_odds_ratio <- fisher_result$estimate
conf_int <- fisher_result$conf.int

# Print the results
fisher_odds_ratio
odds ratio 
  3.476642 
conf_int
[1] 1.872893 6.566896
attr(,"conf.level")
[1] 0.95
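
Another built-in route, not used in the tutorial itself, is logistic regression with `glm()`. In a model with a single binary predictor, the exponentiated coefficient is the sample odds ratio, so it should agree with the manual calculation up to numerical tolerance. The data frame `counts` and its column names are assumptions made for this sketch.

```r
# Grouped counts: one row per group (column names are illustrative)
counts <- data.frame(
  treated = c(1, 0),            # 1 = treatment group, 0 = control group
  recovered = c(60, 30),
  not_recovered = c(40, 70)
)

# Logistic regression on grouped data: cbind(successes, failures) ~ predictor
fit <- glm(cbind(recovered, not_recovered) ~ treated,
           family = binomial, data = counts)

# The exponentiated slope is the odds ratio for treatment vs. control
glm_or <- exp(coef(fit)[["treated"]])
glm_or
```

This approach becomes more useful once other covariates need to be adjusted for, which a plain 2x2 table cannot do.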

95.1.4 Step 4: Interpret the Results

When interpreting the odds ratio:

  • An OR of 1 suggests no association between the treatment and recovery.

  • An OR greater than 1 suggests higher odds of recovery associated with the treatment. Here, an OR of 3.5 means the odds of recovery in the treatment group are 3.5 times those in the control group.

  • An OR less than 1 suggests lower odds of recovery associated with the treatment.

95.1.5 Step 5: Calculate Confidence Intervals

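A confidence interval gives a range of plausible values for the true odds ratio at a stated confidence level (typically 95%). If the interval excludes 1, as it does here, the data are inconsistent with no association at the corresponding significance level.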

# Extracting the confidence interval from the Fisher's Exact Test
lower_ci <- conf_int[1]
upper_ci <- conf_int[2]

# Printing the confidence interval
cat("95% CI for OR: [", lower_ci, ",", upper_ci, "]", "\n")
95% CI for OR: [ 1.872893 , 6.566896 ] 
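
The interval above comes from fisher.test. A common alternative, shown here only as a sketch, is the Wald interval computed on the log scale: the standard error of log(OR) is approximated by the square root of the summed reciprocals of the four cell counts. It relies on a large-sample normal approximation, so it will not match the exact interval and is less reliable with small counts.

```r
# Cell counts from the contingency table
treatment_recovered <- 60
treatment_not_recovered <- 40
control_recovered <- 30
control_not_recovered <- 70

# Log of the sample odds ratio
log_or <- log((treatment_recovered / treatment_not_recovered) /
              (control_recovered / control_not_recovered))

# Approximate standard error of log(OR)
se_log_or <- sqrt(1 / treatment_recovered + 1 / treatment_not_recovered +
                  1 / control_recovered + 1 / control_not_recovered)

# 95% Wald interval, back-transformed to the odds ratio scale
z <- qnorm(0.975)
wald_ci <- exp(log_or + c(-1, 1) * z * se_log_or)
wald_ci
```

For these counts the Wald interval is somewhat narrower than the exact one, and it also lies entirely above 1.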

Remember, an odds ratio does not imply causation and should be interpreted with caution, especially with observational data.